Section: Research Program

Automatic Differentiation

Participants : Laurent Hascoet, Valérie Pascual, Ala Taftaf.


Glossary
automatic differentiation

(AD) Automatic transformation of a program, returning a new program that computes some derivatives of the given initial program, i.e., some combination of the partial derivatives of the program's outputs with respect to its inputs.

adjoint

Mathematical manipulation of the Partial Differential Equations that define a problem, obtaining new differential equations that define the gradient of the original problem's solution.

checkpointing

General trade-off technique, used in adjoint-mode AD, that accepts duplicate execution of a part of the program in order to reduce the memory needed to store intermediate results.


Automatic or Algorithmic Differentiation (AD) differentiates programs. An AD tool takes as input a source program P that, given a vector argument X ∈ IR^n, computes some vector result Y = F(X) ∈ IR^m. The AD tool generates a new source program P' that, given the argument X, computes some derivatives of F. The resulting P' reuses the control of P.
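A minimal sketch of what such a transformation produces, in Python for readability (the function names are illustrative, not the output of any particular AD tool): the tangent-differentiated P' keeps P's control flow and interleaves derivative statements with the original ones.

```python
# Hypothetical example of a source program P and a hand-written
# tangent-differentiated program P' that reuses P's control.

def P(x1, x2):
    """Original program: computes y = F(x1, x2)."""
    if x1 > 0.0:
        t = x1 * x2
    else:
        t = x1 + x2
    return t * t

def P_tangent(x1, x2, dx1, dx2):
    """Tangent program P': same control flow as P, augmented with
    derivative statements; returns (y, dy) with dy = F'(X).X'."""
    if x1 > 0.0:                       # control of P reused unchanged
        dt = dx1 * x2 + x1 * dx2       # derivative of t = x1 * x2
        t = x1 * x2
    else:
        dt = dx1 + dx2                 # derivative of t = x1 + x2
        t = x1 + x2
    dy = 2.0 * t * dt                  # derivative of y = t * t
    y = t * t
    return y, dy
```

For instance, `P_tangent(3.0, 2.0, 1.0, 0.0)` returns the value of F together with its partial derivative with respect to the first input.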

For any given control, P is equivalent to a sequence of instructions, which is identified with a composition of vector functions. Thus, if P executes the sequence of instructions {I1; I2; ... Ip;}, then

F = fp ∘ fp-1 ∘ ... ∘ f1,

where each fk is the elementary function implemented by instruction Ik. AD applies the chain rule to obtain derivatives of F. Calling Xk the values of all variables after instruction Ik, i.e. X0=X and Xk=fk(Xk-1), the chain rule gives the Jacobian of F:

F'(X) = fp'(Xp-1) . fp-1'(Xp-2) . ... . f1'(X0),

which can be mechanically written as a sequence of instructions Ik'. Combining the Ik' with the control of P yields P'. This is therefore a piecewise differentiation, which can be generalized to higher-order derivatives, Taylor series, etc.
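This piecewise differentiation can be sketched for a straight-line program on scalars, where each elementary Jacobian fk'(Xk-1) is a number and the chain rule reduces to a plain product (an illustrative sketch, not the code an AD tool would emit):

```python
# Sketch of the chain rule on a straight-line program F = f3 . f2 . f1,
# here on scalars so every elementary Jacobian is a single number.
import math

# Elementary instructions I1, I2, I3, each paired with its derivative.
steps = [
    (lambda x: x * x,   lambda x: 2.0 * x),   # f1(x) = x^2
    (math.sin,          math.cos),            # f2(x) = sin(x)
    (lambda x: 3.0 * x, lambda x: 3.0),       # f3(x) = 3x
]

def F_and_derivative(x0):
    """Run the program forward, recording each Xk, then apply the
    chain rule: F'(X) = f3'(X2) * f2'(X1) * f1'(X0)."""
    xs = [x0]
    for f, _ in steps:
        xs.append(f(xs[-1]))          # Xk = fk(Xk-1)
    deriv = 1.0
    for k, (_, df) in enumerate(steps):
        deriv *= df(xs[k])            # accumulate fk'(Xk-1)
    return xs[-1], deriv

y, dy = F_and_derivative(2.0)         # F(x) = 3*sin(x^2)
```

Here the recorded values Xk are exactly the intermediate states whose availability becomes the central issue in adjoint mode, discussed below.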

In practice, many applications only need cheaper projections of F'(X) such as:

Sensitivities, defined for a given direction X' in the input space as:

F'(X).X' = fp'(Xp-1) . fp-1'(Xp-2) . ... . f1'(X0) . X' .

This expression is easily computed from right to left, interleaved with the original program instructions. This is the tangent mode of AD.

Adjoints, defined after transposition (F'*), for a given weighting Ȳ of the outputs as:

F'*(X).Ȳ = f1'*(X0) . f2'*(X1) . ... . fp'*(Xp-1) . Ȳ .

This expression is most efficiently computed from right to left, because matrix-times-vector products are cheaper than matrix-times-matrix products. This is the adjoint mode of AD, most effective for obtaining gradients.

Adjoint-mode AD thus yields a very efficient program, at least theoretically [28] . The computation time required for the gradient is only a small multiple of the run-time of P, and is independent of the number of parameters n. In contrast, computing the same gradient with the tangent mode would require running the tangent differentiated program n times.
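The cost contrast can be made concrete on a tiny example with n=3 inputs and one output (hand-written derivative code, for illustration only): the adjoint projection delivers the whole gradient in one run, while the tangent projection must be run once per input direction.

```python
# Sketch: F : R^3 -> R, F(x) = (x1 + 2*x2 + 3*x3)^2, with hand-coded
# tangent (F'(X).X') and adjoint (F'*(X).Ybar) projections.

def jvp(x, dx):
    """Tangent mode: one directional derivative per run."""
    s = x[0] + 2.0 * x[1] + 3.0 * x[2]
    ds = dx[0] + 2.0 * dx[1] + 3.0 * dx[2]
    return 2.0 * s * ds

def vjp(x, ybar):
    """Adjoint mode: transposed elementary Jacobians applied in
    reverse order; one run yields the whole gradient."""
    s = x[0] + 2.0 * x[1] + 3.0 * x[2]
    sbar = 2.0 * s * ybar
    return [sbar, 2.0 * sbar, 3.0 * sbar]

x = [1.0, 1.0, 1.0]
grad_adjoint = vjp(x, 1.0)                        # 1 adjoint run
grad_tangent = [jvp(x, e) for e in
                ([1, 0, 0], [0, 1, 0], [0, 0, 1])]  # n tangent runs
```

Both computations return the same gradient; only the number of program runs differs, and the gap widens with n.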

However, the Xk are required in the inverse of their computation order. If the original program overwrites a part of Xk, the differentiated program must restore Xk before it is used by fk+1'*(Xk). Therefore, the central research problem of adjoint-mode AD is to make the Xk available in reverse order at the cheapest cost, using strategies that combine storage, repeated forward computation from available previous values, or even inverted computation from available later values.
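The storage strategy can be sketched as follows (a toy illustration of the principle, not the mechanism of any particular tool): when an instruction overwrites a variable, the forward sweep pushes the overwritten value on a stack, and the backward sweep pops it so that each Xk is restored exactly when the reverse computation needs it.

```python
# Sketch of store-and-restore for adjoint mode. The program is
# "x = sin(x)" repeated p times; it overwrites x at every step, so
# each overwritten value is saved on a stack during the forward sweep
# and restored, in reverse order, during the backward sweep.
import math

def adjoint(x0, p):
    stack = []
    x = x0
    for _ in range(p):           # forward sweep of P
        stack.append(x)          # save Xk-1 before it is overwritten
        x = math.sin(x)          # Ik: x = sin(x)
    xbar = 1.0                   # seed Ybar = 1
    for _ in range(p):           # backward sweep, reverse order
        x = stack.pop()          # restore Xk-1
        xbar *= math.cos(x)      # xbar = fk'*(Xk-1) . xbar
    return xbar                  # gradient d(xp)/d(x0)
```

Checkpointing, as defined in the glossary, reduces the peak size of such a stack by not saving some values and recomputing them from an earlier saved state instead.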

Another research issue is to make the AD model cope with the constant evolution of modern language constructs. From the old days of Fortran77, novelties include pointers and dynamic allocation, modularity, structured data types, objects, vectorial notation and parallel communication. We keep extending our models and tools to handle new constructs.